Goto

Collaborating Authors

 correlation decay



Reviews: Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay

Neural Information Processing Systems

The major contributions of this paper are that it proves the global convergence of BP(Theorem 1.3) and VI(Theorem 1.2) on ferromagnetic Ising model with a specific initialization, i.e., to initialize variables to be 1. The proof of Theorem 1.2 is based on the fact that the mean-field free energy function, i.e., \Phi(x) is concave on the set S obtained by the update rule, and then we can use Holder's inequality to expand the \Phi(x*) - Phi(x_t) and get the upper bounds. The proof of Theorem 1.3 is based on the fact that the norm of \Phi(v)'s gradient is less than 1(Lemma 3.2), and the properties of variable \mu sandwiched between v 0 and final v T(Lemma 3.5 and Lemma F.1). Other minor contributions include that it provides examples to empirically show the convergence(appendix G) and it shows how to use ellipsoid method to optimize the beliefs(appendix H). I have to admit that I am not familiar with this area, so can only go through a part of the proof, and I am not able to evaluate the originality and quality of this work.


Reviews: Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay

Neural Information Processing Systems

The reviewers liked the results on convergence of belief propagation algorithms for Ising models under certain settings. As a presentational suggestion, they suggest providing more extensive proof sketches in the main section of the paper.


Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay

Neural Information Processing Systems

Belief propagation is a fundamental message-passing algorithm for probabilistic reasoning and inference in graphical models. While it is known to be exact on trees, in most applications belief propagation is run on graphs with cycles. Understanding the behavior of loopy'' belief propagation has been a major challenge for researchers in machine learning, and several positive convergence results for BP are known under strong assumptions which imply the underlying graphical model exhibits decay of correlations. We show that under a natural initialization, BP converges quickly to the global optimum of the Bethe free energy for Ising models on arbitrary graphs, as long as the Ising model is \emph{ferromagnetic} (i.e. This holds even though such models can exhibit long range correlations and may have multiple suboptimal BP fixed points.


Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Neural Information Processing Systems

Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay.


Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay

Neural Information Processing Systems

Belief propagation is a fundamental message-passing algorithm for probabilistic reasoning and inference in graphical models. While it is known to be exact on trees, in most applications belief propagation is run on graphs with cycles. Understanding the behavior of loopy'' belief propagation has been a major challenge for researchers in machine learning, and several positive convergence results for BP are known under strong assumptions which imply the underlying graphical model exhibits decay of correlations. We show that under a natural initialization, BP converges quickly to the global optimum of the Bethe free energy for Ising models on arbitrary graphs, as long as the Ising model is \emph{ferromagnetic} (i.e. This holds even though such models can exhibit long range correlations and may have multiple suboptimal BP fixed points.


How to calculate partition functions using convex programming hierarchies: provable bounds for variational methods

arXiv.org Machine Learning

We consider the problem of approximating partition functions for Ising models. We make use of recent tools in combinatorial optimization: the Sherali-Adams and Lasserre convex programming hierarchies, in combination with variational methods to get algorithms for calculating partition functions in these families. These techniques give new, non-trivial approximation guarantees for the partition function beyond the regime of correlation decay. They also generalize some classical results from statistical physics about the Curie-Weiss ferromagnetic Ising model, as well as provide a partition function counterpart of classical results about max-cut on dense graphs \cite{arora1995polynomial}. With this, we connect techniques from two apparently disparate research areas -- optimization and counting/partition function approximations. (i.e. \#-P type of problems). Furthermore, we design to the best of our knowledge the first provable, convex variational methods. Though in the literature there are a host of convex versions of variational methods \cite{wainwright2003tree, wainwright2005new, heskes2006convexity, meshi2009convexifying}, they come with no guarantees (apart from some extremely special cases, like e.g. the graph has a single cycle \cite{weiss2000correctness}). We consider dense and low threshold rank graphs, and interestingly, the reason our approach works on these types of graphs is because local correlations propagate to global correlations -- completely the opposite of algorithms based on correlation decay. In the process we design novel entropy approximations based on the low-order moments of a distribution. Our proof techniques are very simple and generic, and likely to be applicable to many other settings other than Ising models.


Structure learning of antiferromagnetic Ising models

Neural Information Processing Systems

In this paper we investigate the computational complexity of learning the graph structure underlying a discrete undirected graphical model from i.i.d. samples. Our first result is an unconditional computational lower bound of $\Omega (p^{d/2})$ for learning general graphical models on $p$ nodes of maximum degree $d$, for the class of statistical algorithms recently introduced by Feldman et al. The construction is related to the notoriously difficult learning parities with noise problem in computational learning theory. Our lower bound shows that the $\widetilde O(p^{d+2})$ runtime required by Bresler, Mossel, and Sly's exhaustive-search algorithm cannot be significantly improved without restricting the class of models. Aside from structural assumptions on the graph such as it being a tree, hypertree, tree-like, etc., most recent papers on structure learning assume that the model has the correlation decay property. Indeed, focusing on ferromagnetic Ising models, Bento and Montanari showed that all known low-complexity algorithms fail to learn simple graphs when the interaction strength exceeds a number related to the correlation decay threshold. Our second set of results gives a class of repelling (antiferromagnetic) models that have the \emph{opposite} behavior: very strong repelling allows efficient learning in time $\widetilde O(p^2)$. We provide an algorithm whose performance interpolates between $\widetilde O(p^2)$ and $\widetilde O(p^{d+2})$ depending on the strength of the repulsion.


Learning loopy graphical models with latent variables: Efficient methods and guarantees

arXiv.org Artificial Intelligence

The problem of structure estimation in graphical models with latent variables is considered. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider models where the underlying Markov graph is locally tree-like, and the model is in the regime of correlation decay. For the special case of the Ising model, the number of samples $n$ required for structural consistency of our method scales as $n=\Omega(\theta_{\min}^{-\delta\eta(\eta+1)-2}\log p)$, where p is the number of variables, $\theta_{\min}$ is the minimum edge potential, $\delta$ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and $\eta$ is a parameter which depends on the bounds on node and edge potentials in the Ising model. Necessary conditions for structural consistency under any algorithm are derived and our method nearly matches the lower bound on sample requirements. Further, the proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph.


Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Neural Information Processing Systems

Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency when the number of samples $n$ scales as $n = \Omega(\theta_{\min}^{-\delta \eta(\eta+1)-2}\log p)$, where $\theta_{\min}$ is the minimum edge potential, $\delta$ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and $\eta$ is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements.